DOCS-4086: code samples for building a good dataset #4413

Merged

nathan-contino
Member

@nathan-contino nathan-contino commented Jun 25, 2025

  • adds SDK code samples for Python, Go, TypeScript, and Flutter for each step of dataset creation
  • splits the content across multiple pages because the 'create a dataset' page was already too long
  • splits 'leverage ai' into two sections because it's already too full (and this guide is part of an effort to break it up anyway)

Most non-SDK content is repurposed from the existing 'create a dataset' page.

Apologies for the large line-changed count; hard to avoid when you're splitting up pages and creating examples across multiple languages.

netlify bot commented Jun 25, 2025

Deploy Preview for viam-docs ready!

Name Link
🔨 Latest commit 1873f7f
🔍 Latest deploy log https://app.netlify.com/projects/viam-docs/deploys/68641389f5c51e00088db8d9
😎 Deploy Preview https://deploy-preview-4413--viam-docs.netlify.app
Lighthouse
1 paths audited
Performance: 40 (🔴 down 19 from production)
Accessibility: 100 (no change from production)
Best Practices: 100 (no change from production)
SEO: 92 (no change from production)
PWA: 70 (no change from production)
@viambot viambot added the safe to build This pull request is marked safe to build from a trusted zone label Jun 25, 2025
Collaborator

@JessamyT JessamyT left a comment

Some of my comments are probably unintentionally out-of-diff on content that was repurposed; apologies. Did not review code in any detail since I assume that's all tested. Also GitHub seems to be bugging so will end review now lest comments not actually show up. LMK if I should provide code review (to the extent that I'm qualified to do so :D )

Collaborator

@npentrel npentrel left a comment

Some mostly minor feedback - overall great direction.
The only bigger feedback I have is that this creates quite a lot of new pages. I am not entirely convinced we need that many. Creating a dataset, for example, is fairly short; should that just be an include and part of some of the other pages? Will need to think more about that.

Collaborator

@JessamyT JessamyT left a comment

The flow is hard because there are so many ways to do each step, and some of them can happen in any order (e.g. add to dataset then annotate or vice versa?), not just one linear path. The true flow chart is a pile of strands of spaghetti that each fork off into multiple ends. So I guess this is a plausible way and I don't currently have a better suggestion for how to present the path(s).

Noticed a couple more things; commented. Generally not blocking, except maybe the "get image" vs. "get images" discrepancy in the code samples?

Co-authored-by: Naomi Pentrel <[email protected]>
Collaborator

I don't love the cards there because they go against the flow that the "next" buttons try to suggest. A sentence with links to the steps is less confusing, I think, because there are fewer boxes.

I have an alternate suggestion: why don't we do the following?

  1. Capture and annotate images
  2. Create a training dataset (which includes adding to a training dataset)

capture and annotate could be separate but I feel like that might make create a training set less awkward?

{{% /tab %}}
{{< /tabs >}}

## Capture, annotate, and add images to a dataset
Collaborator

This is still very odd to me. It essentially does both capture + annotate (which is the next page) and adding to a dataset, when we've split those across three pages. Either they belong together and we have them on one page, or this doesn't make sense.

Member Author

this snippet was inspired by the wine-pouring demo example linked to me as a good pattern, so i wanted to find a place for it. if you feel it doesn't fit into the flow of the pages as-is, would you rather i:

  • removed this example entirely?
  • rearranged the pages, perhaps combining add and annotate?

Member Author

Reorganized according to your other comment; hopefully that helps!

Collaborator

yep this lgtm. I think keeping it is good, though maintenance will be painful, so future us might disagree

Member Author

noting to self that we can significantly simplify this code when uploadfiletodataset happens


## Classify images with tags

Classification determines a descriptive tag or set of tags for an image.
Collaborator

an image here might be great and would convey this more immediately, but that doesn't necessarily need to happen with this PR

Member Author

I'll brainstorm some ideas for this and submit a request


{{< alert title="Tip" color="tip" >}}

Unless you already have an ML model that can generate tags for your dataset, use the Web UI to annotate.
Collaborator

this is confusing, because it sort of begs the question: if I do have a model, what then? So it should link to that code, which maybe means that this should go to the annotate page: https://deploy-preview-4413--viam-docs.netlify.app/data-ai/train/update-dataset/#capture-annotate-and-add-images-to-a-dataset

Member Author

I think I misunderstand you; isn't this already on the annotate page? I put this admonition here to help guide people to the appropriate tab in the tabset that follows. Where are you thinking we could link?

Regardless, I reworded this away from the question-begging 'unless', but let me know if I'm missing something else.

Collaborator

@JessamyT JessamyT left a comment

Sorry but latest changes are also a bit confusing....suggested a possible product change 😬

Collaborator

Right now there is still capture and add to dataset content on two different pages which feels confusing.
It's hard to force a linear flow here since with all the different options for each step, the same order doesn't always apply:

  • In most cases, like using the mobile app, uploading a batch, or adding existing images, you can get the data first and then create a dataset and add the data to it. Order doesn't matter, but it's easiest conceptually to think of getting data and then making a dataset with it.
  • One exception: If you capture individual images through the UI and add them to a dataset on the spot, you have to have a dataset to add them to before you start capturing.
  • In the script version of that same "Capture individual images" heading, it looks like you don't specify a dataset id, so you'd still have to add to a dataset later like with the other methods.

Suggestion:

  • Implement Naomi's order suggestion, and:
  • Get rid of the exception: Ask eng to change that capture button to not save to dataset but rather just save the image to your captured data. Just one single click, so you can capture more images in rapid succession, then add a batch all at once later. This would be a less clunky UX IMO, and also solve a docs flow problem.
    • If this will take a while but they'll do it, don't worry about this flow for now; document the rest per Naomi's order
  • If this can't/won't ever be changed in eng, document this as a thing you can do but shape the docs around the normal capture-then-make-a-dataset order
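
For illustration, here is a minimal sketch of the two orderings described above. It uses a hypothetical in-memory `DataStore` rather than the Viam SDK; every class, method, and dataset name below is invented for the example and is not taken from the PR's code samples.

```python
from dataclasses import dataclass, field


@dataclass
class DataStore:
    """Hypothetical in-memory stand-in for captured data and datasets (not the Viam SDK)."""

    captured: list = field(default_factory=list)
    datasets: dict = field(default_factory=dict)

    def capture(self, image_id):
        """Save an image to captured data without touching any dataset."""
        self.captured.append(image_id)
        return image_id

    def create_dataset(self, name):
        """Create an empty dataset if it does not exist yet."""
        self.datasets.setdefault(name, set())
        return name

    def add_to_dataset(self, name, image_ids):
        """Add already-captured images to an existing dataset."""
        if name not in self.datasets:
            raise KeyError(f"dataset {name!r} does not exist yet")
        self.datasets[name].update(image_ids)


def capture_then_add(store, name, image_ids):
    """Usual order: capture first, then create the dataset and add a batch."""
    batch = [store.capture(image_id) for image_id in image_ids]
    store.create_dataset(name)
    store.add_to_dataset(name, batch)


def capture_into_dataset(store, name, image_id):
    """The exception: adding on the spot fails unless the dataset already exists."""
    store.add_to_dataset(name, [store.capture(image_id)])
```

In this model, `capture_then_add` works on an empty store, while `capture_into_dataset` raises `KeyError` until `create_dataset` has been called, which mirrors why the on-the-spot flow forces a different page order.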

Member

viambot commented Jul 1, 2025

It looks like the following files may have been renamed. Please ensure you set all needed aliases:
rename docs/data-ai/{ai/advanced => train}/_index.md (37%)
rename docs/data-ai/{ai => train}/train-tflite.md (86%)
rename docs/data-ai/{ai => train}/train.md (99%)
rename docs/data-ai/{ai/advanced => train}/upload-external-data.md (95%)

@nathan-contino nathan-contino merged commit 06c1acb into viamrobotics:main Jul 1, 2025
12 checks passed
@nathan-contino nathan-contino deleted the DOCS-4086-build-good-dataset branch July 1, 2025 17:00
github-actions bot commented Jul 1, 2025

🔎💬 Inkeep AI search and chat service is syncing content for source 'Viam Docs'

4 participants